Real-time implementation of field programmable gate arrays (FPGA) design in hyperspectral imaging

ABSTRACT

A Field Programmable Gate Arrays (FPGA) design uses a Coordinate Rotation DIgital Computer (CORDIC) algorithm that can convert a Givens rotation of a vector to a set of shift-add operations. The CORDIC algorithm can be easily implemented in hardware architecture, therefore in FPGA. Since the computation of the inverse of the data correlation matrix involves a series of Givens rotations, the utility of the CORDIC algorithm allows a causal Constrained Energy Minimization (CEM) to perform real-time processing in FPGA. An FPGA implementation of the causal CEM is described and its detailed architecture is also described.

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application No.60/482,150, filed on Jun. 24, 2003 in the United States Patent andTrademark Office, the disclosure of which is hereby incorporated byreference.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a Field Programmable Gate Arrays (FPGA)design and implementation, and more particularly to utilizing aCoordinate Rotation DIgital Computer (CORDIC) algorithm that allows acausal Constrained Energy Minimization (CEM) to perform real-timeprocessing in FPGA for hyperspectral processing.

2. Description of the Related Art

The importance of real-time processing has been recently realized andrecognized in many applications. In some applications, e.g. on-boardspacecraft data processing system, it is very useful to have high levelsof processing throughput. Specifically, as spacecraft instrumentsgenerate data at an ever-increasing speed, on-board science dataprocessing has been largely absent from remote sensing missions. Manyadvantages can result from real-time processing. One is the detection ofmoving targets. This is critical and crucial on a battlefield whenmoving targets such as tanks or missile launching vehicles pose a realthreat to ground troops. The real-time data processing provides timelyintelligence information which can help to reduce casualties. Anotheradvantage is on-board data processing. For space-borne satellites,real-time data processing can significantly reduce mass storage of datavolume. A third advantage is chip design. Real-time processing can beimplemented in parallel and reduce computational loads. Furthermore, itcan also reduce payload in aircrafts and satellites. Over the pastyears, many subpixel detection and mixed pixel algorithms have beendeveloped and shown to be very versatile. However, their applicabilityto real-time processing problem is generally restricted by very complexcomputational workloads.

SUMMARY OF THE INVENTION

The present invention addresses the aforementioned problems. An objectof the present invention is to provide a system and method capable ofperforming real-time processing of Constrained Energy Minimization(CEM).

Another object of the present invention is to provide a system andmethod for providing Field Programmable Gate Arrays (FPGA) utilizing aCoordinate Rotation Digital Computer (CORDIC) algorithm the allows acausal CEM to perform real-time processing in FPGA.

A further object of the present invention is to utilize the CORDIC toconvert Givens rotations to a series of shifts-adds to use to decomposea matrix into triangular matrices that can be implemented in real time.

To achieve these and other advantages and in accordance with the purposeof the present invention, as embodied and broadly described herein,there is provided an exemplary embodiment of a method of implementingfield programmable gate arrays comprising: converting a Givens rotationof a vector to a set of operations; wherein a coordinate rotationdigital computer algorithm is used for said converting.

Consistent with another aspect of the present invention, there isprovided another exemplary embodiment of a method of implementing fieldprogrammable gate arrays for hyperspectral detection and classificationcomprising: applying a coordinate rotation digital computer algorithm toreceived data to generate an upper triangular matrix by Givens rotation;applying backsubstitution to said generated upper triangular matrix toobtain an inverse of said upper triangular matrix; and applying finiteimpulse response filtering to current input pixel streams to generate anoutput; wherein said generated output indicates a detected targetsignature.

Consistent with a still further aspect of the present invention, thereis provided another exemplary embodiment of a method of and an apparatusfor implementing field programmable gate arrays for hyperspectraldetection and classification, comprising: performing Givens rotation onan input pixel stream to generate an upper triangular matrix; applyingbacksubstitution to said generated upper triangular matrix to obtain aninverse of said upper triangular matrix; transposing said inverse ofsaid upper triangular matrix using distributed arithmetic; obtaining afilter weight vector; and applying finite impulse response filtering tocurrent input pixel streams to generate an output; wherein a coordinaterotation digital computer algorithm is used to perform the Givensrotation by a series of operations, and said generated output indicatesa detected target signature.

Consistent with a still further aspect of the present invention, thereis provided another exemplary embodiment of a method of and an apparatusfor implementing field programmable gate arrays for hyperspectraldetection and classification comprising: generating a correlation matrixR from an input pixel stream; applying a coordinate rotation digitalcomputer algorithm to convert a matrix [R|d] to an upper triangularmatrix by Givens rotation, where d is a desired target signature;applying backsubstitution to said upper triangular matrix to obtainfilter coefficients; and applying finite impulse response filtering tocurrent input pixel streams to generate an output; wherein saidcorrelation matrix is updated every time a new pixel arrives, and saidgenerated output indicates a detected target signature.

The foregoing and other objects, features, aspects and advantages of thepresent invention will become more apparent from the following detaileddescription of the present invention when taken in conjunction with theaccompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and other aspects, features and advantages of the presentinvention will become apparent from the following description ofexemplary embodiments given in conjunction with the accompanyingdrawings, in which:

FIG. 1 illustrates a systolic array for QR decomposition consistent withan embodiment of the present invention;

FIG. 2 illustrates a systolic array for backsubstitution consistent withan embodiment of the present invention;

FIG. 3( a) illustrates a boundary cell of systolic array consistent withan embodiment of the present invention;

FIG. 3( b) illustrates an internal cell systolic array consistent withan embodiment of the present invention;

FIG. 4 illustrates a shift-adder DA Architecture consistent with anembodiment of the present invention;

FIG. 5 illustrates a computation of w_(k) consistent with an embodimentof the present invention;

FIG. 6 illustrates a FIR filter for abundance estimation consistent withan embodiment of the present invention;

FIG. 7 illustrates a block diagram of the auto-correlator consistentwith another embodiment of the present invention;

FIG. 8 illustrates a QR-decomposition by CORDIC circuit consistent withanother embodiment of the present invention;

FIG. 9 illustrates a systolic array for backsubstitution consistent withanother embodiment of the present invention;

FIG. 10 a illustrates a boundary cell implementation for systolic arrayconsistent with another embodiment of the present invention;

FIG. 10 b illustrates an internal cell implementation for systolic arrayconsistent with another embodiment of the present invention;

FIGS. 11( a)-(h) illustrate real-time updated triangular matrixes forvarying lines of pixel streams via CORDIC circuit;

FIGS. 12( a)-(h) illustrate real-time updated weights for varying linesof pixel streams;

FIGS. 13( a)-(h) illustrate real-time detection result for varying linesof pixel streams;

FIG. 14( a) illustrates a block diagram of a FPGA design of CEMconsistent with an embodiment of the present invention;

FIG. 14( b) illustrates a block diagram of a FPGA design of CEMconsistent with another embodiment of the present invention;

FIG. 15 illustrates a block diagram of a FPGA design of OSP consistentwith a further embodiment of the present invention;

FIG. 16 illustrates a block diagram of a FPGA design of CRXD consistentwith yet a further embodiment of the present invention;

FIG. 17( a) illustrates a block diagram of a stationary UKFLU approachconsistent with yet a further embodiment of the present invention; and

FIG. 17( b) illustrates a block diagram of a dynamic UKFLU approachconsistent with yet a further embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

Various embodiments of the present invention use a Field ProgrammableGate Arrays (FPGA) design for real-time implementation of ahyperspectral detection and classification algorithm, called constrainedenergy minimization (CEM), for hyperspectral data exploitation. CEM isdiscussed by J. C. Harsanyi in the dissertation entitled Detection andClassification of Subpixel Spectral Signatures in Hyperspectral ImageSequences. The issue of the real-time processing for CEM was alsostudied in the article “Real-time processing algorithms for targetdetection and classification in hyperspectral imagery,” by C.-I Chang,H. Ren and S. S. Chiang, where its systolic array implementation wasdeveloped. In recent years, rapid advances in VLSI technology have had alarge impact on modern digital signal processing. Over the past thirtyyears, the number of transistors per chip has doubled about once a year.Therefore, VLSI design of complex algorithms have become more and morefeasible. The major difficulty with implementing these algorithms inreal time is the computation of the inverse of a data correlation matrixperformed in the CEM, which requires a complete set of data samples. Inorder to cope with this problem, the data correlation matrix must becalculated in a causal manner which only needs data samples up to thesample at the time it is processed. Systolic arrays provide apossibility, but requires a series of Givens rotations to decompose amatrix into triangular matrices that can be implemented in real time.Unfortunately, such Givens rotations cannot be realized in hardware. Inorder to resolve this issue, the Givens rotations must be performed byoperations such as adds, ORs, XORs, shifts that can be realized inhardware architecture. In order to do so, the COordinate RotationDIgital Computer (CORDIC) algorithm developed by J. E. Volder “TheCORDIC trigonometric computer technique,” which allows the conversion ofa Givens rotation to a series of shifts-adds operations, is used. Usingsystolic arrays architecture in conjunction with the CORDIC algorithm,the computation of a matrix inverse can be implemented in a set ofshifts-adds operations. As a result, the FPGA design of CEM is madepossible, leading to implementation for chip fabrication. The detailedFPGA design layout for a causal CEM will be described in detail.

Assume that a remotely sensed image is a collection of image pixelsdenoted by {r₁, r₂, . . . , r_(N)} where r_(i)=(r_(i1), r_(i2), . . .r_(iL))^(T) for 1≦i≦N is an L-dimensional pixel vector, N is the totalnumber of pixels in the image, and L is the total number of spectralchannels. Suppose that d=(d₁, . . . , d_(L))^(T) is the desired targetsignature of interest in the image. The goal is to design a finiteimpulse response (FIR) linear filter specified by L filter coefficients{w₁, w₂, . . . w_(L)}, denoted by an L-dimensional vector w=(w₁, . . . ,w_(L))^(T) that can be used to detect the signature d without knowingthe image background. Assuming that y_(i) is the output of the designedFIR filter resulting from the input r_(i), then y_(i) can be expressedbyy _(i)=Σ_(l=1) ^(L) w _(l) r _(il) =r _(i) ^(T) w=w ^(T) r _(i).  (1)

In order to detect the desired target signature d using the filteroutput y_(i), the FIR filter must be constrained by the followingequationd ^(T) w=1, i.e, d ^(T) w=Σ _(l=1) ^(L) d _(l) w _(l)=1 for 1≦j≦k  (2)so that the d can pass through the filter, while the output energiesresulting from other signatures will be minimized. This problem is alinearly constrained adaptive beamforming problem, called minimumvariance distortionless response (MVDR) which can be cast as follows:min_(w){w^(T)R_(L×L)w} subject to the constraint d^(T)w=1  (3)where R_(L×L)=(1/N)[Σ_(i=1) ^(N)r_(i)r_(i) ^(T)] is the autocorrelationsample matrix of the image and

$\begin{matrix}{{\frac{1}{N}\left\lbrack {\sum\limits_{i = 1}^{N}\; y_{i}^{2}} \right\rbrack} = {{\frac{1}{N}\left\lbrack {\sum\limits_{i = 1}^{N}\;{\left( {r_{i}^{T}w} \right)^{T}r_{i}^{T}w}} \right\rbrack} = {{{w^{T}\left( {\frac{1}{N}\left\lbrack {\sum\limits_{i = 1}^{N}\;{r_{i}r_{i}^{T}}} \right\rbrack} \right)}w} = {w^{T}R_{L \times L}{w.}}}}} & (4)\end{matrix}$The solution to Eq. (4) can then be obtained, such that

$\begin{matrix}{w^{*} = {\frac{R_{L \times L}^{- 1}d}{d^{T}R_{L \times L}^{- 1}d}.}} & (5)\end{matrix}$The filter specified by the optimal weight vector w* is the constrainedenergy minimization (CEM).

One significant advantage of the CEM is that the correlation matrixR_(L×L) in the optimal weights specified by Eq. (5) can be decomposedinto a product of a unitary matrix Q and an upper triangular matrix R byeither the Givens rotations or the Householder transform. Such adecomposition is commonly referred to as QR-decomposition. Anotheradvantage is that the CEM can be implemented in real time where thecorrelation matrix can be carried out either line-by-line orpixel-by-pixel from left to right and top to bottom. For illustrativepurpose, it is assumed that the correlation matrix is performedline-by-line and at each line t a data matrix X_(t)=[r_(t1), r_(t2), . .. r_(tN)] is formed up to this particular line. In this case, theR_(L×L) in Eq. (3) is replaced by the data autocorrelation matrix ofline t in the image, denoted by Σ_(t), whereby

$\begin{matrix}{\sum\limits_{t}^{\;}\;{= {{\frac{1}{N}\left\lbrack {\sum\limits_{i = 1}^{N}\;{r_{ti}r_{ti}^{T}}} \right\rbrack} = {{\frac{1}{N}\left\lbrack {X_{t}X_{t}^{T}} \right\rbrack}.}}}} & (6)\end{matrix}$With QR-decomposition, X_(t) can be expressed byX _(r) =Q _(t) R _(t).  (7)

Here, Q_(t) is a unitary matrix with Q_(t) ⁻¹=Q_(t) ^(T) and

$R_{t} = \begin{bmatrix}R_{t}^{upper} \\0\end{bmatrix}$is not necessarily of full rank, where 0 is a zero vector and

$\begin{matrix}{R_{t}^{upper} = \begin{bmatrix}* & * & \ldots & * \\0 & * & \ldots & * \\\vdots & ⋰ & ⋰ & \vdots \\0 & \ldots & 0 & *\end{bmatrix}} & (8)\end{matrix}$is an upper triangular matrix, and * in R_(t) ^(upper) is a nonzeroelement. From Eq. (6), the inverse of Σ_(t) can be computed asΣ_(t) ⁻¹ =N(XX ^(T))=N{(R _(t) ^(upper))⁻¹[(R _(t)^(upper))^(T)]⁻¹}  (9)where the unitary matrix Q_(t) is canceled out in Eq. (6) because Q_(t)⁻¹=Q_(t) ^(T). Substituting Σ_(t) ⁻¹ in Eq. (9) for R_(L×L) ⁻¹ in (5)yieldsw*={(R _(t) ^(upper))⁻¹[(R _(t) ^(upper))^(T)]⁻¹ }·d(d ^(T){(R _(t)^(upper))⁻¹[(R _(t) ^(upper))^(T)]⁻¹ }d)⁻¹.  (10)Since R_(t) ^(upper) is an upper triangular matrix, so is (R_(t)^(upper))⁻¹. Therefore, Eq. (10) does not require computation of R_(L×L)⁻¹. As a result, it can be implemented in real time processing. In anexemplary embodiment, Givens Rotation is used to performQR-decomposition.

In a first embodiment of the invention for CEM implementation, Eq. (5)is decomposed into several components, each component implemented by aseparated hardware module, whereby

$\begin{matrix}{w^{*} = {\frac{R_{L \times L}^{- 1}d}{d^{T}R_{L \times L}^{- 1}d} = {\frac{\left( {XX}^{T} \right)^{- 1}d}{{d^{T}\left( {XX}^{T} \right)}^{- 1}d} = {\frac{{\left( R_{t}^{upper} \right)^{- 1}\left\lbrack \left( R_{t}^{upper} \right)^{T} \right\rbrack}^{- 1}d}{{{d^{T}\left( R_{t}^{upper} \right)}^{- 1}\left\lbrack \left( R_{t}^{upper} \right)^{T} \right\rbrack}^{- 1}d}.}}}} & (11)\end{matrix}$To implement Eq. (11), five modules are required:

-   Module 1: Array of CORDIC circuits (101, 102, 103) shown in FIG. 1    where the pixel stream is fed into the module and the upper    triangular matrix R_(t) ^(upper) is updated in real-time.-   Module 2: Apply backsubstitution to obtain the inverse of R_(t)    ^(upper) invR.-   Module 3: Apply distributed arithmetic to calculate c=[(R_(t)    ^(upper))^(T)]⁻¹d=invR^(T)*d.-   Module 4: Compute w=invR^(T)*c.-   Module 5: The filter output energy can be obtained by applying a FIR    filter to the current input pixel streams.    The detailed implementation for each of five modules is described as    follows.

In Module 1, a set of CORDIC circuits (101, 102, 103) are applied toperform Givens rotation, as shown in FIG. 1. Values of r in FIG. 1 areinput pixel values, and values of {tilde over (r)} are processed pixelvalues. For demonstrative purpose, assume L=3. First, two pixel streams(row 1 and row 2) are fed into the CORDIC circuit (101). As a result,the first zero, which will occupy the r₂₁ position, is introduced by aGivens Rotation. The first pair (r₁₁, r₁₂) is the leading pair, itdecides the angle needed to be rotated, and the other pairs arefollowers which will be rotated by the same angle accordingly. Then, thesecond zero, which will occupy the r₃₁ position, is introduced by thesecond CORDIC circuit (102) that operates on the row 1 and row 2. Thethird zero, which will occupy the r₃₂ position, is introduced by aCORDIC circuit (103) which operates on row 2 and 3. Finally, the outputbecomes an upper triangular matrix.

In Module 2, backsubstitution is applied to obtain the inverse of anupper triangular matrix R_(t) ^(upper), (R_(t) ^(upper))⁻¹. For anillustrative purpose, assume that upper triangular matrix is

$R_{t}^{upper} = {\begin{bmatrix}{\overset{\sim}{r}}_{11} & {\overset{\sim}{r}}_{12} & {\overset{\sim}{r}}_{13} \\0 & {\overset{\sim}{r}}_{22} & {\overset{\sim}{r}}_{23} \\0 & 0 & {\overset{\sim}{r}}_{33}\end{bmatrix}.}$Its inverse is also an upper triangular matrix. If we let

$\begin{matrix}{\left( R_{t}^{upper} \right)^{- 1} = {\begin{bmatrix}a_{11} & a_{12} & a_{13} \\0 & a_{22} & a_{23} \\0 & 0 & a_{33}\end{bmatrix} = \left\lbrack \begin{matrix}a_{1} & a_{2} & {\left. a_{3} \right\rbrack,}\end{matrix} \right.}} & (12) \\{then} & \; \\{R_{t}^{upper}\left\lbrack {\begin{matrix}a_{1} & a_{2} & {\left. a_{3} \right\rbrack = {I = \begin{bmatrix}1 & 0 & 0 \\0 & 1 & 0 \\0 & 0 & 1\end{bmatrix}}}\end{matrix} = \left\lbrack {\begin{matrix}b_{1} & b_{2} & {\left. b_{3} \right\rbrack,}\end{matrix}{and}} \right.} \right.} & (13)\end{matrix}$it can be calculated viaR _(t) ^(upper) a _(k) =b _(k)  (14)where the upper triangular matrix R_(t) ^(upper) and the vector b_(k)are known and the vector a_(k) needs to be computed. Using Eq. (14), thebacksubstitution can be described by the following recursive equation

$\begin{matrix}{a_{kj} = \frac{\left( {b_{kj} - {\sum\limits_{j = {i + 1}}^{n}\;{r_{ij}a_{kj}}}} \right)}{r_{ii}}} & (15)\end{matrix}$with internal cells performing the summation given byΣ_(j=i+1) ^(n) r _(ij) a _(kj)  (16)and boundary cells completing the calculation. The architecture of thebacksubstitution array is shown in FIG. 2, with boundary cell B-Cell 1(201) and internal cells I-Cell 1 (202) and I-Cell 2 (203). Theimplementation of a boundary cell (300) is shown in FIG. 3( a) and theimplementation of an internal cell (310) is shown in FIG. 3( b).

In Module 3, letting c=(c₁, . . . , c_(L))^(T) and

${{{inv}\; R^{T}} = \begin{bmatrix}v_{11} & 0 & 0 & \ldots & 0 \\v_{12} & v_{22} & 0 & \ldots & 0 \\v_{13} & v_{23} & v_{33} & \cdots & 0 \\\vdots & \vdots & \vdots & ⋰ & \vdots \\v_{1L} & v_{2L} & v_{3L} & \ldots & v_{LL}\end{bmatrix}},$then c=invR^(T)*d can be represented as the inner product of two vectorsas follows:c _(i)=Σ_(k=1) ^(L) v _(ki) d _(k).  (17)

Because the desired signature d_(k) is known a priori, the termv_(ki)d_(k) is simply a multiplication with a constant. In this case, adistributed arithmetic (DA) architecture widely used in FPGA technologycan be used to implement this module.

Assume that a B-bit system is used. The variable v_(ki) can berepresented by:

$\begin{matrix}{v_{ki} = {{\sum\limits_{b = 0}^{B - 1}{{v(b)}*2^{b}\mspace{14mu}{with}\mspace{20mu}{v(b)}}} \in \left\lbrack {0,1} \right\rbrack}} & (18)\end{matrix}$where v(b) denotes the b^(th) bit of v_(ki). Therefore, the innerproduct in (17) can be represented asc _(i)=Σ_(k=1) ^(L) d _(k)*Σ_(b=0) ^(B−1)(v(b)*2^(b)).  (19)Re-distributing the order of summation results inc _(i)=Σ_(b=0) ^(B−1)2^(b)*Σ_(k=1) ^(L)(d _(k) *v _(b)(k))=Σ_(b=0)^(B−1)2^(b)*Σ_(k=1) ^(L)ƒ(d _(k) ,v _(b)(k)).  (20)

Implementation of the function ƒ(d_(k), v_(b)(k)) in Eq. (20) can berealized by a LUT. That is, a LUT is preprogrammed to accept an L-bitinput vector v_(b)=(v_(b)(0), . . . , v_(b)(1), v_(b)(L−1))^(T), andoutput ƒ(d_(k), v_(b)(k)). The i weighted by an appropriate power-of-twofactor and accumulated. The accumulation can be efficiently implementedusing a shift-adder (400) as shown in FIG. 4. After L look-up cycles,the inner product c_(i) is computed at output (410).

In Module 4, a weight vector is generated. The inverse of R_(t)^(upper), invR is an upper triangular matrix, and can be represented as

$\begin{matrix}{{invR}_{t}^{upper} = \begin{bmatrix}v_{11} & v_{12} & \ldots & v_{1L} \\v_{21} & v_{22} & \ldots & v_{2L} \\\vdots & \vdots & ⋰ & \vdots \\v_{L1} & v_{L2} & \ldots & v_{LL}\end{bmatrix}} & (21)\end{matrix}$where v_(ij)=0 if i>j. The purpose of this module is to computew=invR_(t) ^(upper)*c.

Suppose w=(w₁, w₂, . . . , w_(L))^(T) and c=(c₁, c₂, . . . , c_(L))^(T).The kth element of w can be obtained by

$\begin{matrix}{w_{k} = {\sum\limits_{m = 1}^{L}{c_{m}v_{km}}}} & (22)\end{matrix}$with its computation circuit (500) shown in FIG. 5.

In Module 5, operation is shown in FIG. 6, delineating a FIR filter(600) to produce estimated abundance fractions, determining filteroutput energy.

In a second embodiment of the invention for CEM implementation, thecorrelation matrix is calculated before QR-decomposition shown in FIG.6. Since d^(T)R_(L×L) ⁻¹ in Eq. (5) is a constant, R_(L×L) ⁻¹drepresents relative strength of detection power. Letting w=R_(L×L) ⁻¹d,i.e. R_(L×L)w=d, where R_(L×L) ⁻¹d can be implemented by a CORDIC moduleand followed by a backsubstitution module. As a result, four modules arerequired to implement this modified CEM detector:

-   Module 1: Auto-correlator.-   Module 2: Apply CORDIC circuits to triangularize [R|d].-   Module 3: Apply backsubstitution to obtain w.-   Module 4: The filter output energy δ_(CEM)(r)=w^(T)r can be obtained    by applying a FIR filter to the current input pixel streams.

In Module 1, an auto-correlator generates the correlation matrixR_(L×L)(i) with R_(L×L)(i)=R_(L×L)(i−1)+r_(i)r_(i) ^(T) and apixel-by-pixel update process. In other words, the correlation matrix isupdated every time a new pixel arrives. For illustrative purpose, let usassume L=3, then the new correlation matrix can be obtained by

$\begin{matrix}{{\begin{bmatrix}R_{11} & R_{12} & R_{13} \\R_{21} & R_{22} & R_{23} \\R_{31} & R_{32} & R_{33}\end{bmatrix}(i)} = {{\begin{bmatrix}R_{11} & R_{12} & R_{13} \\R_{21} & R_{22} & R_{23} \\R_{31} & R_{32} & R_{33}\end{bmatrix}\left( {i - 1} \right)} + {{\begin{bmatrix}r_{i1} \\r_{i2} \\r_{i3}\end{bmatrix}\begin{bmatrix}r_{i1} & r_{i2} & r_{i3}\end{bmatrix}}.}}} & (23)\end{matrix}$The implementation of Eq. (23) is shown in auto-correlator (700) of FIG.7.

In Module 2, QR-decomposition of the correlation matrix by CORDICcircuit is accomplished. The purpose of this module is to triangularizethe matrix [R|d], i.e. to convert the matrix

$\left\lbrack \begin{matrix}R_{11} & R_{12} & R_{13} \\R_{21} & R_{22} & R_{23} \\R_{31} & R_{32} & R_{33}\end{matrix} \middle| \begin{matrix}\begin{matrix}d_{1} \\d_{2}\end{matrix} \\d_{3}\end{matrix} \right\rbrack$to an upper triangular matrix

$\left\lbrack \begin{matrix}{\overset{\sim}{R}}_{11} & {\overset{\sim}{R}}_{12} & {\overset{\sim}{R}}_{13} \\0 & {\overset{\sim}{R}}_{22} & {\overset{\sim}{R}}_{23} \\0 & 0 & {\overset{\sim}{R}}_{33}\end{matrix} \middle| \begin{matrix}\begin{matrix}{\overset{\sim}{d}}_{1} \\{\overset{\sim}{d}}_{2}\end{matrix} \\{\overset{\sim}{d}}_{3}\end{matrix} \right\rbrack$by Givens Rotation. Similar to the first embodiment, the module isimplemented by a set of CORDIC circuits, the difference being that thepixel stream is fed directly to the circuit for the first embodiment,while in this second embodiment, both the elements in the correlationmatrix and the signatures of the desired target are fed, for example,the first data stream feeding into the CORDIC circuit (801) is R₁₁first, followed by R₁₂, and then R₁₃, the last one is d₁, as depicted inFIG. 8. Values of R in FIG. 8 are input pixel values, and values of{tilde over (R)} are processed pixel values.

As soon as the first CORDIC (801) in the chain has been fed by all thedata it needs for row one, it is free to accept and process data for thenew row. The second CORDIC (802) can begin its work on row 2 of thetriangular matrix as soon as the first CORDIC (801) has finished itswork on the second element pair. The third CORDIC (803) can begin itswork as soon as the second CORDIC (802) has finished its work, and soon.

In Module 3, backsubstitution is applied to obtain the CEM filtercoefficients w with w=R_(L×L) ⁻¹d={tilde over (R)}_(L×L) ⁻¹{tilde over(d)} where {tilde over (R)}_(L×L) is an upper triangular matrix. For anillustrative purpose, assume that

${\overset{\sim}{R}}_{L \times L} = {\begin{bmatrix}{\overset{\sim}{r}}_{11} & {\overset{\sim}{r}}_{12} & {\overset{\sim}{r}}_{13} \\0 & {\overset{\sim}{r}}_{22} & {\overset{\sim}{r}}_{23} \\0 & 0 & {\overset{\sim}{r}}_{33}\end{bmatrix}.}$Letting the weight be w=(w₁, w₂, w₃)^(T), it can be calculated via{tilde over (R)} _(L×L) w=d  (24)where the upper triangular matrix {tilde over (R)}_(L×L) and the vectord are known and the vector w needs to be computed. Using Eq. (24), thebacksubstitution can be described by the following recursive equation:

$\begin{matrix}{w_{i} = \frac{\left( {{\overset{\sim}{d}}_{i} - {\sum\limits_{j = {i + 1}}^{l}{\overset{\sim}{r}}_{ij}} - w_{j}} \right)}{r_{ii}}} & (25)\end{matrix}$with internal cells performing the summation given byΣ_(j=i+1) ^(n) r _(ij) a _(kj)  (26)and boundary cells completing the calculation. The architecture of thebacksubstitution array is shown in FIG. 9, with boundary cell B-Cell 1(901) and internal cells I-Cell 1 (902) and I-Cell 2 (903). Theimplementation of a boundary cell (1000) is depicted in FIG. 10( a) andthe implementation of an internal cell (1010) is depicted in FIG. 10(b).

In Module 4, analogous to the first embodiment, the FIR filter (600) mayalso operate to produce estimated abundance fractions, determiningfilter output energy.

Simulation results using architecture of the first embodiment will nowbe presented. Here, Givens rotation is used to simulate the inventivedesign. The code is written in C and use the floating-point operation.As shown in FIG. 11, the output of CORDIC circuit is updated inreal-time, i.e., the upper triangular matrix receives more informationas a new line is fed to the circuit.

FIG. 11( a)-(h) are the results after the first 25, 50, 75, 100, 175 and200 lines of pixels streams are fed into the CORDIC circuits. In themean time, the output of the weight-generated circuit also progressivelyupdates its weight, as depicted in FIG. 12. Additionally, the estimatesof the FIR filter coefficients become more accurate and approximatelyapproach the desired FIR coefficients as more pixels are fed to thecircuit. As demonstrated in FIG. 12( a)-(h), the weights generated after100 lines are very close to the ones generated by using the complete setof pixels streams. FIG. 13 shows real-time detection results from thedesired FIR filter, where FIG. 13( a)-(h) are the detection results ofthe first 25 lines, first 50 lines, and so on, until all 200 lines arecompleted.

Two embodiments have been discussed for computing the input stream inCEM. The first embodiment computes the input stream from image pixelvectors directly, while the second embodiment computes the samplecorrelation matrix R.

The first embodiment requires five modules. The first module is designedfrom an array of CORDIC circuits where the pixel stream is fed into themodule and the upper triangular matrix R_(t) ^(upper) is updated inreal-time. This is followed by the second module which appliesbacksubstitution to compute the inverse of R_(t) ^(upper), invR. ThenModule 3 uses a distributed arithmetic to calculate c=[(R_(t)^(upper))^(T)]⁻¹d=invR^(T)*d where the d is the desired targetsignature. Next, Module 4 is developed to obtain the desired filtervector w by finding w=invR*c . Finally, Module 5 produces the results byapplying a FIR filter to the current input pixel streams.

The second embodiment takes an alternative approach by first computingthe auto-correlation matrix R. Four modules are required for thisembodiment. Module 1 is an auto-correlator that calculates the samplecorrelation matrix R. It is then followed by Module 2, which uses theCORDIC circuits to triangularize [R|d]. Next, Module 3 appliesbacksubstitution to obtain the desired filter vector w. Finally, Module4 produces the filter output energy δ_(CEM)(r)=w^(T)r for targetdetection by applying a FIR filter to the current input pixel streams.FIG. 14( a) and FIG. 14( b) depict block diagrams of the first andsecond embodiments, respectively, to be used for FPGA design of the CEM.

Focus has been placed on real-time implementation of the CEM. Numerousother embodiments utilizing CORDIC for subpixel detection and mixedpixel classification approaches are also possible. These approachesinclude, but are not limited to Orthogonal Subspace Projection (OSP),causal RX algorithm-based anomaly detector (CRXD), and an unsupervisedKalman Filter-based Linear Unmixing (UKFLU). In OSP, computation of thepseudo-inverse of a matrix is required, similar to CEM's computation ofthe inverse of the sample correlation matrix. OSP also requires completetarget knowledge for subpixel detection and multipixel classification.FIG. 15 depicts a block diagram of four modules used for FPGA design ofOSP. In a first module, a systolic array is designed for a matrixmultiplication circuit. A second module applies Faddeeva algorithm tocalculate the undesired target signature rejector P_(U) ^(⊥), which maybe implemented by CORDIC circuits. A third module obtains filtercoefficients by a distributed arithmetic circuit. A fourth modulegenerates the detection/classification results by a FIR filter.

In CRXD, a real-time processing RX detector is utilized for anomalydetection for multispectral and hyperspectral images. The CRXD processand updates data either line-by-line or sample-by sample, and may alsoimplement anomaly detection in real-time. An FPGA design of CRXDcomprising four modules is shown in FIG. 16. Module 1 is an array ofCORDIC circuits which process the input pixel stream, outputting anupper triangular matrix. Module 2 is a systolic array computing theinverse of the upper triangular matrix by the backsubstitutionalgorithm. Module 3 is designed to generate a desired set of filtercoefficients. Module 4 is a FIR filter formed by the filter coefficientsobtained in Module 3, producing detection and classification results.

Alternatively, in another embodiment, UKFLU provides signatureestimation for remotely sensed images. UKFLU takes into accountinter-pixel spatial correlation. An embodiment of a stationary KalmanFilter for UKFLU is shown in FIG. 17( a), and an embodiment of a dynamicKalman Filter for UKFLU is shown in FIG. 17( b).

Embodiments of the present invention are not limited to implementationin field programmable gate arrays (FPGA). Further embodiments of thepresent invention may utilize other hardware structures, including butnot limited to Application-Specific Integrated Circuits (ASIC), adaptivearray processing and passive array processing. Software processes mayalso be utilized in implementing embodiments of the present invention.Application of embodiments of the present invention is wide ranging,also including but not limited to beamforming and other filteringtechniques, and for use in satellite communication. It is contemplatedthat numerous modifications may be made to the embodiments andimplementations of the present invention without departing from thespirit and scope of the invention as defined in the following claims.

1. A method of implementing field programmable gate arrays in imageprocessing and detection, comprising: converting a Givens rotation of avector to a set of operations; performing the set of operations on inputpixel streams to generate an upper triangular matrix; applyingbacksubstitution to said generated upper triangular matrix to obtain aninverse of said upper triangular matrix; transposing said inverse ofsaid upper triangular matrix using distributed arithmetic; obtaining afilter weight vector; and applying finite impulse response filtering tothe input pixel streams to generate an output; wherein a coordinaterotation digital computer operation implemented in the fieldprogrammable gate arrays is used for said converting; and said generatedoutput indicates a detected target signature.
 2. A method ofimplementing field programmable gate arrays in image processing anddetection, comprising: converting a Givens rotation of a vector to a setof operations; generating a correlation matrix R from input pixelstreams; applying said set of operations to convert a matrix [R|d] to anupper triangular matrix by Givens rotation, where d is a desired targetsignature; applying backsubstitution to said upper triangular matrix toobtain filter coefficients; and applying finite impulse responsefiltering to the input pixel streams to generate an output; wherein acoordinate rotation digital computer operation implemented in the fieldprogrammable gate arrays is used for said converting; and saidcorrelation matrix is updated every time a new pixel arrives, and saidgenerated output indicates a detected target signature.
 3. The methodimplementing field programmable gate arrays in image processing anddetection, comprising: converting a Givens rotation of a vector to a setof operations; applying said set of operations to received input pixelstreams to generate an upper triangular matrix; applyingbacksubstitution to said generated upper triangular matrix to obtain aninverse of said upper triangular matrix; and applying finite impulseresponse filtering to the input pixel streams to generate an output;wherein a coordinate rotation digital computer operation implemented inthe field programmable gate arrays is used for said converting; and saidgenerated output indicates a detected target signature.
 4. A method ofimplementing field programmable gate arrays for hyperspectral detectionand classification, comprising: applying a coordinate rotation digitalcomputer operation to received input pixel streams to generate an uppertriangular matrix by Givens rotation; applying backsubstitution to saidgenerated upper triangular matrix to obtain an inverse of said uppertriangular matrix; and applying finite impulse response filtering to theinput pixel streams to generate an output; wherein said generated outputindicates a detected target signature.
 5. The method of claim 4, furthercomprising: transposing said inverse of said upper triangular matrixusing distributed arithmetic; and obtaining a filter weight vector;wherein coefficients of said finite impulse response filtering aredefined by said filter weight vector.
 6. The method of claim 4, furthercomprising generating a correlation matrix R from an input pixel stream,wherein said received data is a matrix [R|d], where d is a desiredtarget signature.
 7. A method of implementing field programmable gatearrays for hyperspectral detection and classification, comprising:performing Givens rotation on the input pixel streams to generate anupper triangular matrix; applying backsubstitution to said generatedupper triangular matrix to obtain an inverse of said upper triangularmatrix; transposing said inverse of said upper triangular matrix usingdistributed arithmetic; obtaining a filter weight vector; and applyingfinite impulse response filtering to the input pixel streams to generatean output; wherein a coordinate rotation digital computer operation isused to perform the Givens rotation by a series of operations, and saidgenerated output indicates a detected target signature.
 8. The method ofclaim 7, wherein coefficients of said finite impulse response filteringare defined by said filter weight vector.
 9. The method of claim 7,wherein said finite impulse response filter is${w^{*} = \frac{{\left( R_{t}^{upper} \right)^{- 1}\left\lbrack \left( R_{t}^{upper} \right)^{T} \right\rbrack}^{- 1}d}{{{d^{T}\left( R_{t}^{upper} \right)}^{- 1}\left\lbrack \left( R_{t}^{upper} \right)^{T} \right\rbrack}^{- 1}d}},$specified by an optimal weight vector w* such that where, R_(t) ^(upper)is said upper triangular matrix and d is a desired target signature. 10.The method of claim 7, wherein said target signature is detected withoutknowledge of a target image background.
 11. The method of claim 7,wherein said series of operations comprise shifts-adds.
 12. The methodof claim 7, wherein said series of operations comprise adds, ORs, XORs,and shifts.
 13. The method of claim 7, wherein said upper triangularmatrix is updated every time a new pixel is received from said inputpixel stream.
 14. The method of claim 7, wherein the field programmablegate arrays implement a constrained energy minimization operation.
 15. Aconstrained energy minimization implementation apparatus of fieldprogrammable gate arrays, comprising: a first module generating an uppertriangular matrix based on input pixel streams; a second module applyingbacksubstitution to said generated upper triangular matrix to obtain aninverse of said upper triangular matrix; a third module transposing saidinverse of said upper triangular matrix using distributed arithmetic; afourth module obtaining a filter weight vector; and a fifth modulefiltering the input pixel streams to generate a filter output energy;wherein said first module further comprises a plurality of coordinaterotation digital computer operation circuits performing Givens rotationon the input pixel streams.
 16. The apparatus of claim 15, wherein saidgenerated filter output energy indicates a detected target signature.17. The apparatus of claim 15, wherein said fifth module furthercomprises a finite impulse response filter.
 18. The apparatus of claim17, wherein coefficients of said finite impulse response filter aredefined by said filter weight vector.
 19. A method of implementing fieldprogrammable gate arrays for hyperspectral detection and classification,comprising: generating a correlation matrix R from input pixel streams;applying a coordinate rotation digital computer algorithm to convert amatrix [R|d] to an upper triangular matrix by Givens rotation, where dis a desired target signature; applying backsubstitution to said uppertriangular matrix to obtain filter coefficients; and applying finiteimpulse response filtering to the input pixel streams to generate anoutput; wherein said correlation matrix is updated every time a newpixel arrives, and said generated output indicates a detected targetsignature.
 20. The method of claim 19, wherein said generated outputcomprises a filter output energy δ_(CEM)(r)=w^(T)r, where w is a weightvector, r is a vector with L dimensions, and L is a number of spectralbands.
 21. The method of claim 19, wherein said target signature isdetected without knowledge of a target image background.
 22. The methodof claim 19, wherein the field programmable gate arrays implement aconstrained energy minimization operation.
 23. A constrained energyminimization implementation apparatus of field programmable gate arrays,comprising: an autocorrelator generating a correlation matrix R frominput pixel streams; a plurality of coordinate rotation digital computercircuits converting a matrix [R|d] to an upper triangular matrix byGivens rotation, where d is a desired target signature; abacksubstitution module performing on said upper triangular matrix toobtain filter coefficients; and an output module filtering the inputpixel streams to generate an output; wherein said correlation matrix isupdated every time a new pixel arrives, and said generated outputindicates a detected target signature.
 24. The method of claim 23,wherein said output module comprises a finite impulse response filter.